Version: 25.06

Detection Rules Architecture

This overview explains how Content Identification detection rules are structured and how the different components work together to identify and classify sensitive content. Understanding this architecture is essential for creating effective content detection systems.

Component Overview

Content Identification uses a hierarchical structure of four component layers, from rule packs down to matching elements, that work together to detect and classify content.

Architecture Layers

Layer 1: Rule Packs

Rule Packs are the top-level containers that organize and package classification rules for deployment.

Key Characteristics:

  • XML files containing one or more classification rules
  • Must have a unique ID and version information
  • Include metadata for management and deployment
  • Can contain shared resources used by multiple rules
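The following Python sketch illustrates these characteristics by validating a rule pack before deployment. It checks that the pack carries an ID and a version and that its rule IDs are unique. The element and attribute names (`RulePack`, `Metadata`, `Resources`, `Rules`, `Rule`) are a hypothetical layout chosen for the example. They are not the product's actual rule pack schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule pack layout, for illustration only; the real schema may differ.
SAMPLE_PACK = """<RulePack id="pii-us" version="1.2.0">
  <Metadata>
    <Name>US PII</Name>
    <Owner>security-team</Owner>
  </Metadata>
  <Resources>
    <Keywords id="ssn-terms">ssn,social security</Keywords>
  </Resources>
  <Rules>
    <Rule id="us-ssn" type="entity"/>
    <Rule id="credit-card" type="entity"/>
  </Rules>
</RulePack>"""


def validate_rule_pack(xml_text: str) -> list[str]:
    """Return a list of problems found in a (hypothetical) rule pack document."""
    root = ET.fromstring(xml_text)
    problems = []
    # Every pack needs an identity and a version so deployments can be tracked.
    for attr in ("id", "version"):
        if not root.get(attr):
            problems.append(f"missing pack attribute: {attr}")
    rules = root.findall("./Rules/Rule")
    if not rules:
        problems.append("pack contains no classification rules")
    # Rule IDs must be unique within the pack.
    seen = set()
    for rule in rules:
        rule_id = rule.get("id")
        if not rule_id:
            problems.append("rule without id")
        elif rule_id in seen:
            problems.append(f"duplicate rule id: {rule_id}")
        else:
            seen.add(rule_id)
    return problems


print(validate_rule_pack(SAMPLE_PACK))  # []
```

Here the shared `Keywords` resource sits beside the rules, so several rules could reference it by ID rather than duplicating the keyword list.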

Layer 2: Classification Rules

Classification Rules define the logic for identifying specific types of sensitive content.

Rule Types:

  • Entity Rules: Detect specific data types (SSNs, credit card numbers, etc.)
  • Evidence Rules: Look for supporting evidence
  • Proximity Rules: Check for related content nearby
  • Affinity Rules: Detect content relationships
  • Similarity Rules: Find similar content patterns
  • Pattern Rules: Match specific text patterns
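The following minimal Python sketch makes two of these rule types concrete. It shows a pattern rule and a proximity rule. The proximity rule reports a pattern match only when a supporting keyword appears within a character window around it. The class names, the US SSN format regex, the keywords, and the window size are illustrative assumptions. They are not the product's rule definitions.

```python
import re
from dataclasses import dataclass


@dataclass
class PatternRule:
    """Matches a regular expression anywhere in the text."""
    name: str
    pattern: re.Pattern

    def matches(self, text: str) -> list[re.Match]:
        return list(self.pattern.finditer(text))


@dataclass
class ProximityRule:
    """Keeps a primary match only if a keyword occurs within `window` characters of it."""
    primary: PatternRule
    keywords: tuple[str, ...]  # expected in lowercase; compared against lowercased text
    window: int = 50

    def matches(self, text: str) -> list[re.Match]:
        lowered = text.lower()
        hits = []
        for m in self.primary.matches(text):
            nearby = lowered[max(0, m.start() - self.window): m.end() + self.window]
            if any(keyword in nearby for keyword in self.keywords):
                hits.append(m)
        return hits


ssn_format = PatternRule("us-ssn-format", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"))
ssn_near_keyword = ProximityRule(ssn_format, keywords=("ssn", "social security"), window=20)

text = "Employee SSN: 123-45-6789. Order ref 987-65-4321 shipped."
print([m.group() for m in ssn_format.matches(text)])        # ['123-45-6789', '987-65-4321']
print([m.group() for m in ssn_near_keyword.matches(text)])  # ['123-45-6789']
```

The order reference has the same format as an SSN, so the pattern rule reports both numbers. The proximity rule keeps only the one near the keyword "SSN".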

Layer 3: Policy Elements

Policy Elements evaluate metadata and structural properties of content.
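Policy elements need only metadata, so they can be modeled as small predicates that run before any content is read. The sketch below assumes a hypothetical `ContentMetadata` record with a file name and size. It composes an extension check with a size limit. The helper names (`extension_in`, `max_size`, `all_of`) are illustrative only.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Callable


@dataclass(frozen=True)
class ContentMetadata:
    """Hypothetical metadata known before the content itself is read."""
    file_name: str
    size_bytes: int


# A policy element is modeled as a predicate over metadata.
PolicyElement = Callable[[ContentMetadata], bool]


def extension_in(*extensions: str) -> PolicyElement:
    """True when the file extension is one of `extensions` (case-insensitive)."""
    allowed = {e.lower().lstrip(".") for e in extensions}

    def check(meta: ContentMetadata) -> bool:
        return PurePath(meta.file_name).suffix.lower().lstrip(".") in allowed

    return check


def max_size(limit_bytes: int) -> PolicyElement:
    """True when the content is no larger than `limit_bytes`."""
    def check(meta: ContentMetadata) -> bool:
        return meta.size_bytes <= limit_bytes

    return check


def all_of(*elements: PolicyElement) -> PolicyElement:
    """Logical AND of several policy elements."""
    def check(meta: ContentMetadata) -> bool:
        return all(element(meta) for element in elements)

    return check


office_docs_under_10mb = all_of(extension_in("docx", "xlsx"), max_size(10 * 1024 * 1024))

print(office_docs_under_10mb(ContentMetadata("Q3-Payroll.XLSX", 2_000_000)))  # True
print(office_docs_under_10mb(ContentMetadata("backup.zip", 2_000_000)))       # False
print(office_docs_under_10mb(ContentMetadata("huge.docx", 50_000_000)))       # False
```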

Layer 4: Matching Elements

Matching Elements analyze the actual content for sensitive data patterns.
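A typical matching element pairs a pattern with a validation step to reduce false positives. The sketch below uses a regular expression to find candidates of 13 to 19 digits. It keeps only those that pass the Luhn (mod 10) checksum used by payment card numbers. The function names are hypothetical, and the product's built-in credit card detection may work differently.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn (mod 10) checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Matching element: regex candidates that also pass the Luhn check."""
    results = []
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            results.append(digits)
    return results


sample = "Card 4111 1111 1111 1111 on file; ticket 1234 5678 9012 3456."
print(find_card_numbers(sample))  # ['4111111111111111']
```

Both 16-digit sequences match the pattern, but only the first passes the checksum. The ticket number is discarded without any further analysis.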

Data Flow Architecture

The detection process follows a structured data flow.

Evaluation Context

The evaluation context provides the data environment within which rules operate.
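An evaluation context can be pictured as an object that bundles the content's metadata with its raw bytes. On first access it computes normalized text, and every rule then reads that same text without repeating extraction. The class below is a hypothetical sketch. Its fields and normalization steps (UTF-8 decoding, Unicode NFKC normalization, whitespace collapsing) are assumptions.

```python
import re
import unicodedata
from functools import cached_property


class EvaluationContext:
    """Hypothetical evaluation context shared by all rules for one item of content."""

    def __init__(self, metadata: dict, raw: bytes, encoding: str = "utf-8"):
        self.metadata = metadata
        self.raw = raw
        self.encoding = encoding
        self.extractions = 0  # counts how often text extraction actually runs

    @cached_property
    def text(self) -> str:
        """Decoded, NFKC-normalized text with whitespace collapsed; computed once."""
        self.extractions += 1
        decoded = self.raw.decode(self.encoding, errors="replace")
        normalized = unicodedata.normalize("NFKC", decoded)
        return " ".join(normalized.split())


ctx = EvaluationContext({"file_name": "memo.txt"}, "SSN:\u00a0123-45-6789\n\nthanks".encode("utf-8"))

# Two illustrative rules that both read the shared, cached text.
checks = {
    "keyword-ssn": lambda c: "ssn" in c.text.lower(),
    "ssn-format": lambda c: re.search(r"\b\d{3}-\d{2}-\d{4}\b", c.text) is not None,
}
print({name: check(ctx) for name, check in checks.items()})  # {'keyword-ssn': True, 'ssn-format': True}
print(ctx.text)         # SSN: 123-45-6789 thanks
print(ctx.extractions)  # 1
```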

Rule Execution Model

Rules are executed using a multi-phase approach:

Phase 1: Pre-filtering

  • Quick metadata checks
  • File type validation
  • Size and format constraints

Phase 2: Content Analysis

  • Text extraction and normalization
  • Pattern recognition
  • Data structure analysis

Phase 3: Rule Evaluation

  • Policy element evaluation
  • Matching element processing
  • Logical condition resolution

Phase 4: Post-processing

  • Confidence calculation
  • Action determination
  • Result aggregation
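The four phases can be sketched as a single function. This Python model is simplified and rests on several hypothetical choices: the supported MIME types, the size limit, the `Rule` fields, and the confidence formula. A real engine would run much richer analysis in each phase.

```python
import re
from dataclasses import dataclass
from typing import Callable

SUPPORTED_TYPES = {"text/plain", "text/csv"}  # hypothetical pre-filter settings
MAX_BYTES = 50 * 1024 * 1024


@dataclass
class Rule:
    name: str
    policy: Callable[[dict], bool]  # policy element: metadata check
    pattern: re.Pattern             # matching element: content check
    min_matches: int = 1            # logical condition on the match count
    action: str = "flag"


@dataclass
class Result:
    rule: str
    confidence: float
    action: str


def detect(metadata: dict, raw: bytes, rules: list[Rule]) -> list[Result]:
    # Phase 1: pre-filtering on metadata, file type, and size only.
    if metadata.get("mime_type") not in SUPPORTED_TYPES or len(raw) > MAX_BYTES:
        return []

    # Phase 2: content analysis; extract and normalize text once.
    text = " ".join(raw.decode("utf-8", errors="replace").split())

    results = []
    for rule in rules:
        # Phase 3: rule evaluation; policy element first, then matching element.
        if not rule.policy(metadata):
            continue
        count = len(rule.pattern.findall(text))
        if count < rule.min_matches:
            continue
        # Phase 4: post-processing; confidence calculation and action.
        confidence = min(1.0, 0.5 + 0.1 * count)  # hypothetical scoring
        results.append(Result(rule.name, confidence, rule.action))

    # Result aggregation: strongest detections first.
    return sorted(results, key=lambda r: r.confidence, reverse=True)


rules = [
    Rule("us-ssn", policy=lambda m: True,
         pattern=re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    Rule("email", policy=lambda m: m.get("source") == "external",
         pattern=re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), action="audit"),
]
meta = {"mime_type": "text/csv", "source": "internal"}
sample = b"id,ssn,email\n1,123-45-6789,a@example.com\n2,987-65-4321,b@example.com\n"
print(detect(meta, sample, rules))
```

With this metadata, the email rule's policy element skips it, so the result is a single `us-ssn` detection built from two matches. An unsupported MIME type returns an empty list before any text is decoded.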

Integration Points

The detection system integrates with various platform components.

Performance Considerations

Optimization Strategies

  1. Rule Ordering: Evaluate the most selective rules first
  2. Early Termination: Stop evaluating once a definitive match is found
  3. Caching: Reuse analysis results instead of recomputing them
  4. Parallel Processing: Evaluate rules concurrently
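The first two strategies combine in a few lines. In this sketch each rule carries a hypothetical `selectivity` value, the fraction of content it is expected to match, along with a confidence. Rules are evaluated from most to least selective, and evaluation stops as soon as a match reaches a definitive confidence threshold. The field names and the 0.9 threshold are assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable

SSN_FORMAT = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class ScoredRule:
    name: str
    check: Callable[[str], bool]
    selectivity: float  # hypothetical: fraction of content this rule is expected to match
    confidence: float   # confidence reported when the rule matches


def evaluate(text: str, rules: list[ScoredRule], definitive: float = 0.9):
    """Return (matched rule names, number of rules evaluated)."""
    matched = []
    evaluated = 0
    # Rule ordering: the most selective (rarest-matching) rules run first.
    for rule in sorted(rules, key=lambda r: r.selectivity):
        evaluated += 1
        if rule.check(text):
            matched.append(rule.name)
            # Early termination: a definitive match settles the classification.
            if rule.confidence >= definitive:
                break
    return matched, evaluated


rules = [
    ScoredRule("keyword-confidential", lambda t: "confidential" in t.lower(), 0.30, 0.4),
    ScoredRule("ssn-format", lambda t: SSN_FORMAT.search(t) is not None, 0.01, 0.95),
    ScoredRule("digits-present", lambda t: any(c.isdigit() for c in t), 0.80, 0.1),
]
print(evaluate("Confidential: SSN 123-45-6789", rules))  # (['ssn-format'], 1)
print(evaluate("Confidential memo", rules))              # (['keyword-confidential'], 3)
```

For the first text only the SSN rule runs. It is the most selective rule and its match is definitive, so the keyword and digit rules are never evaluated. The second text has no definitive match, so all three rules run.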

Scalability Factors

  • Rule Complexity: Simpler rules perform better
  • Content Size: Larger files require more processing
  • Pattern Density: More patterns increase overhead
  • Context Depth: Deeper analysis impacts performance

Best Practices

Rule Design

  • Start with broad patterns, then refine them for precision
  • Use policy elements to filter before content analysis
  • Combine multiple weak indicators for stronger detection
  • Test rules with representative content samples
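One common technique for combining weak indicators is noisy-OR, shown here as an illustration; the product does not necessarily compute confidence this way. Noisy-OR assumes the indicators are independent and returns the probability that at least one of them is correct: 1 - (1 - p1)(1 - p2)...(1 - pn).

```python
from math import prod


def combine_noisy_or(confidences: list[float]) -> float:
    """Noisy-OR: probability that at least one independent indicator is correct."""
    for p in confidences:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"confidence out of range: {p}")
    return 1.0 - prod(1.0 - p for p in confidences)


weak_indicators = [0.4, 0.3, 0.5]  # e.g. keyword, nearby label, document type (illustrative)
print(round(combine_noisy_or(weak_indicators), 2))  # 0.79
print(combine_noisy_or([]))                          # 0.0 (no evidence)
```

Three individually weak indicators combine to about 0.79, which is higher than any one of them alone. Noisy-OR overstates confidence when indicators are correlated, such as two keywords that always appear together, so correlated evidence is better counted once.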

Architecture Planning

  • Group related rules in logical rule packs
  • Design for maintainability and updates
  • Consider localization and regional requirements
  • Plan for rule versioning and deployment

Performance Optimization

  • Profile rule performance regularly
  • Monitor false positive/negative rates
  • Optimize frequently used patterns
  • Balance accuracy with processing speed
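Profiling and error-rate monitoring can share one harness run over labeled samples. The sketch below times each rule with `time.perf_counter`. It computes each rule's false positive rate, FP / (FP + TN), and false negative rate, FN / (FN + TP). The rule dictionary, the sample format, and the labels are hypothetical.

```python
import re
import time
from collections import defaultdict

SSN_FORMAT = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def profile_rules(rules, samples):
    """Time each rule and compute its false positive and false negative rates.

    rules:   dict of rule name -> predicate(text) -> bool
    samples: list of (text, set of rule names that should fire on that text)
    """
    elapsed = defaultdict(float)
    counts = {name: {"tp": 0, "fp": 0, "fn": 0, "tn": 0} for name in rules}
    for text, expected in samples:
        for name, predicate in rules.items():
            start = time.perf_counter()
            fired = predicate(text)
            elapsed[name] += time.perf_counter() - start
            should_fire = name in expected
            if fired:
                counts[name]["tp" if should_fire else "fp"] += 1
            else:
                counts[name]["fn" if should_fire else "tn"] += 1

    report = {}
    for name, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["tp"] + c["fn"]
        report[name] = {
            "false_positive_rate": c["fp"] / negatives if negatives else 0.0,
            "false_negative_rate": c["fn"] / positives if positives else 0.0,
            "avg_ms": 1000 * elapsed[name] / len(samples) if samples else 0.0,
        }
    return report


rules = {"ssn-format": lambda t: SSN_FORMAT.search(t) is not None}
samples = [
    ("SSN 123-45-6789", {"ssn-format"}),
    ("Part no. 555-12-3456", set()),    # SSN-shaped part number: a false positive
    ("no identifiers here", set()),
    ("SSN 123456789", {"ssn-format"}),  # unformatted SSN the pattern misses: a false negative
]
report = profile_rules(rules, samples)
print(report["ssn-format"]["false_positive_rate"])  # 0.5
print(report["ssn-format"]["false_negative_rate"])  # 0.5
```

Tracking these numbers over time makes the accuracy and speed trade-off visible. For example, it shows whether tightening a pattern to remove false positives also raises the false negative rate or the average evaluation time.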